# load libraries ----
# reporting
library(knitr)
# visualization
library(ggplot2)
library(ggthemes)
library(patchwork)
library(viridis)
# remote sensing & GIS
library(sf)
library(raster)
library(ggspatial)
library(rnaturalearth)
library(rnaturalearthhires)
library(rnaturalearthdata)
# data wrangling
library(plyr)
library(dplyr)
library(tidyr)
library(tibble)
library(readr)
library(skimr)
library(janitor)
library(glue)
library(magrittr)
library(lubridate)
# stats
library(corrplot)
library(Hmisc)
# set other options ----
options(scipen=999)
knitr::opts_chunk$set(
tidy = FALSE,
message = FALSE,
warning = FALSE)34 Patterns of Topography & Vegetation I
Learning Objectives
After completing this tutorial you should be able to
- understand the importance of long-term monitoring stations and baseline data
- understand how spatial and temporal patterns affect ecological processes
- describe what raster data sets are and how they encode spatial information
- read spatial raster data in R, making simple maps, and extracting information for non-spatial statistical tests.
- test whether plant growth (greenness & height) is drive more by elevation, slope, or aspect.
- consider large scale patterns of relationships between topography and vegetation
Download the directory for this project here, make sure the directory is unzipped and move it to your bi328 directory. You can open the Rproj for this module either by double clicking on it which will launch Rstudio or by opening Rstudio and then using File > Open Project or by clicking on the Rproject icon in the top right of your program window and selecting Open Project.
There should be a file named 34_vegetation-I.qmd in that project directory. Use that file to work through this tutorial - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this quarto document to record your answers. Remember to use comments to annotate your code; at minimum you should have one comment per code set1 you may of course add as many comments as you need to be able to recall what you did. Similarly, take notes in the document as we discuss discussion/reflection questions but make sure that you go back and clean them up for “public consumption”.
1 You should do this whether you are adding code yourself or using code from our manual, even if it isn’t commented in the manual… especially when the code is already included for you, add comments to describe how the function works/what it does as we introduce it during the participatory coding session so you can refer back to it.
Before you get started, let’s make sure to read in all the packages we will need - take a look and install any that you are still missing!
34.1 NEON & the power of longterm data sets
34.2 Plot maps of topography and vegetation
The data set we are going to start out with is from a NEON station in the Sierra Nevada mountains in California from a region called Soaproot Saddle.
You already have several *.tif files in your data folder. These are raster data sets with information on NDVI and topography that were sampled for this NEON site in 2018.
First we will load Digital Terrain Model data (DTM) which was obtained using LiDAR.
dtm <- raster("data/NEON_D17_SOAP_DP3_298000_4100000_DTM.tif")
class(dtm)[1] "RasterLayer"
attr(,"package")
[1] "raster"
Next, we will read in the Digital Surface Model LiDAR raster data set.
dsm <- raster("data/NEON_D17_SOAP_DP3_298000_4100000_DSM.tif")We already have information on elevation, but let’s calculate slope and aspect as additional metrics to describe the topography.
We will use degrees as a measure for slope based on the DTM raster data set.
slope <- terrain(dtm, opt = "slope", unit = "degrees", neighbors = 8)To determine aspect, we will calculate the “northness” as the cosine of aspect, which will read in radians2.
2 Radian is the SI unit for measuring angles; it is a dimensionless value.
# calculate aspect
aspect <- terrain(dtm, opt = "aspect", unit = "radians", neighbors = 8)
# calculate northness
north <- cos(aspect)Now we have everything we need to describe the topography of the monitoring site using elevation, slope, and aspect.
Our overarching question is the extent to which topography impacts vegetation. We will use two variables to describe “levels of vegetation”, first the Normalized difference vegetation index (NDVI) and second vegetation height.
Spectral imaging was used to determine the Normalized Difference vegetation index (NDVI) for our monitoring station.
ndvi <- raster("data/NEON_D17_SOAP_DP3_298000_4100000_NDVI.tif")We can calculate the vegetation height as DSM - DTM.
veg_ht <- dsm - dtmCurrently, we have several individual raster objects, each forming a different layer describing topography and vegetation for the same locations.
We can combine all these individual layers into a raster stack. This produces a list in which each element is a different raster layer. Doing this has some advantages, for example, it is straight forward to make a single plot combining these different layers.
# create raster stack
all_data <- stack(dtm, dsm, slope, north, ndvi, veg_ht)
# rename raster layers
names(all_data) <- c("DTM", "DSM", "slope", "north", "NDVI", "Vegetation.Height")Now we can create maps for each of our raster files3. We will use the function plot() instead of our usual ggplot options to keep it simple.
3 This may take a second to plot … you are processing a bunch of data!
plot(all_data,
col = viridis(255))Let’s consider what correlations we would expect to find based on these six maps.
34.3 Explore relationships between topographic variable and vegetation patterns
First, lets convert our raster layers into data frames for easier use. We’ll start by creating an empty data frame with as many rows as there are cells in our raster objects. Then we can use the function extract() pulling out the information for each pixel and put it into a set of columns using the extent argument, which tells R to pull out all the information in a specific raster layer and turn it into a vector4.
4 Recall, each column of a data.frame is a vector
# create and empty data frame
df <- as.data.frame(matrix(NA, nrow = ncell(dtm), ncol=0))
# extract vegetation height
df$veg_ht <- raster::extract(veg_ht, extent(veg_ht))
# extract ndvi
df$ndvi <- raster::extract(ndvi, extent(ndvi))
# extract dtm
df$dtm <- raster::extract(dtm, extent(dtm))
# extract slope
df$slope <- raster::extract(slope, extent(slope))
# extract aspect
df$north <- raster::extract(north, extent(north))
head(df) veg_ht ndvi dtm slope north
1 7.8100586 0.8477020 1297.25 NaN NaN
2 8.9599609 0.8142292 1297.36 NaN NaN
3 7.5100098 0.8118778 1297.39 NaN NaN
4 0.7000732 0.8314815 1297.35 NaN NaN
5 0.8100586 0.7325861 1297.32 NaN NaN
6 0.0000000 0.6941839 1297.20 NaN NaN
Note, that we no longer have spatial information (coordinates), however, values in the same row do correspond to the same pixel location in our raster object.
Let’s go ahead and remove all rows that contain a missing value for one parameter.
df <- df %>%
filter(!is.na(slope)) %>%
filter(!is.na(north))
head(df) veg_ht ndvi dtm slope north
1 6.46008301 0.8252837 1297.20 11.99227 -0.8944847
2 6.85998535 0.8504076 1297.03 18.01359 -0.9609535
3 0.61999512 0.6705710 1297.00 17.02542 -0.9999917
4 0.00000000 0.6165509 1296.93 14.48724 -0.9770750
5 0.00000000 0.5876359 1296.89 14.64421 -0.9805627
6 0.03991699 0.5609888 1296.90 18.33933 -0.8936710
Because we have so much data, we will take a 1% subset of the data to decrease the computational power and time needed.
Because our data set is so large (1 Million rows!), we would expect that a random subset is representative of the relationships as a whole.
dplyr::slice_sample() can be used to specify the proportion of rows that you would like to retain. The function will return a random sub sample.
# set seed for reproducibility
set.seed(42)
# randomly select 1% of rows
df_sub <- df %>%
slice_sample(prop = 0.01)Now, let’s take a look at the relationship between the vegetation and topographic variables.
To do this efficiently using the tidyverse principles and some ggplot magic, need to pivot our data set. Ultimately, we want to have a data set with one column with our topography parameters, one with the topography measurements, one with vegetation variables and one with measurements; then we will be able to use facet_grid() to plot all combinations of variables.
df_plot <- df_sub %>%
pivot_longer(names_to = "topog_param", values_to = "topog_meas", 3:5) %>%
pivot_longer(names_to = "veget_param", values_to = "veget_meas", 1:2)
ggplot(df_plot, aes(x = topog_meas, y = veget_meas)) +
geom_point(alpha = 0.25, size = .75) +
facet_grid(veget_param ~ topog_param, scales = "free") +
labs(x = "Topography", y = "Vegetation") +
theme_facetLet’s determine whether these variables are correlated using the Pearson correlation coefficients. We can use Hmisc::rcorr to calculate all pairwise correlations and test whether the relationships are significant.
P_corr <- rcorr(as.matrix(df), type="pearson")The output is a list. Recall that we can access individual components of a list using $, similar the way we access the individual vectors comprising a data.frame. In this case P_corr$r contains the correlation coefficients, and P_corr$p contains the p-values.
Visualizing correlations in a correlation plot can be helpful for a quick overview when you are working with a lot of different values, we can do this using the function corrplot().
corrplot(P_corr$r)Alternatively, you can print the correlation coefficients as a matrix.
P_corr$r veg_ht ndvi dtm slope north
veg_ht 1.00000000 0.15271745 -0.10739489 0.054293139 0.049763864
ndvi 0.15271745 1.00000000 -0.09762246 0.204216049 0.031199918
dtm -0.10739489 -0.09762246 1.00000000 0.038739447 0.086963361
slope 0.05429314 0.20421605 0.03873945 1.000000000 -0.005534425
north 0.04976386 0.03119992 0.08696336 -0.005534425 1.000000000
Let’s also print the corresponding p-values:
P_corr$P veg_ht ndvi dtm slope north
veg_ht NA 0 0 0.00000000000000 0.00000000000000
ndvi 0 NA 0 0.00000000000000 0.00000000000000
dtm 0 0 NA 0.00000000000000 0.00000000000000
slope 0 0 0 NA 0.00000003325176
north 0 0 0 0.00000003325176 NA
Let’s coerce our correlation coefficients into something a bit more tidy. while we’re at it we can also add the abbreviation for the site we’ve been looking at. Let’s write that data frame out as a text file in our results folder so we can access our results down the line.
tidy_cor <- as.data.frame(P_corr$r) %>%
rownames_to_column("Param1") %>%
pivot_longer(names_to = "Param2", values_to = "pearson", 2:6) %>%
filter(Param1 %in% c("veg_ht", "ndvi")) %>%
filter(!Param2 %in% c("veg_ht", "ndvi")) %>%
mutate(Site = "SOAP")
write_delim(tidy_cor, "results/SOAP_correlation.txt", delim = "\t")34.4 Acknowledgments
These activities are based on the EDDIE Remote Sensing module.5
5 Dahlin, K. M. (2021). Remote Sensing of Plants and Topography in R (Project EDDIE).